gvi posterior
Rates of Convergence of Generalised Variational Inference Posteriors under Prior Misspecification
Mildner, Terje; Giampouras, Paris; Damoulas, Theodoros
We prove rates of convergence and robustness to prior misspecification within a Generalised Variational Inference (GVI) framework with bounded divergences. This addresses a significant open challenge for GVI and Federated GVI methods that, under prior misspecification, employ a divergence other than the Kullback--Leibler divergence, operate within a subset of possible probability measures, and result in intractable posteriors. Our theoretical contributions cover severe prior misspecification and rely on restricting the space of possible GVI posterior measures and inferring properties from this space. In particular, we establish sufficient conditions for existence and uniqueness of GVI posteriors on arbitrary Polish spaces, prove that the GVI posterior measure concentrates on a neighbourhood of loss minimisers, and extend this to rates of convergence regardless of the prior measure.
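For orientation, the kind of objective a GVI posterior minimises can be sketched as follows. The notation is an assumption, since the abstract does not fix any: $\ell$ is a loss, $D$ a divergence, $\pi$ the prior, $\mathcal{Q}$ the restricted set of candidate probability measures, and $x_1, \dots, x_n$ the observations.

```latex
\[
  q_n^{*} \in \operatorname*{arg\,min}_{q \in \mathcal{Q}}
  \left\{
    \mathbb{E}_{q(\theta)}\!\left[ \sum_{i=1}^{n} \ell(\theta, x_i) \right]
    + D(q \,\|\, \pi)
  \right\}
\]
```

In this form, taking $\ell$ to be the negative log-likelihood and $D$ the Kullback--Leibler divergence gives standard variational inference. The setting of the abstract corresponds to replacing $D$ with a bounded divergence.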
Frequentist Consistency of Generalized Variational Inference
This paper investigates Frequentist consistency properties of the posterior distributions constructed via Generalized Variational Inference (GVI). A number of generic and novel strategies are given for proving consistency, relying on the theory of $\Gamma$-convergence. Specifically, this paper shows that under minimal regularity conditions, the sequence of GVI posteriors is consistent and collapses to a point mass at the population-optimal parameter value as the number of observations goes to infinity. The results extend to the latent variable case without additional assumptions and hold under misspecification. Lastly, the paper explains how to apply the results to a selection of GVI posteriors with especially popular variational families. For example, consistency is established for GVI methods using the mean field normal variational family, normal mixtures, Gaussian process variational families as well as neural networks indexing a normal (mixture) distribution.
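The collapse to a point mass can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a squared-error loss, a normal prior, the Kullback--Leibler divergence (the special case of GVI that coincides with standard variational inference), and a normal variational family, for which the minimiser is available in closed form. The function name `gvi_normal_posterior` and all parameter names are hypothetical.

```python
import numpy as np


def gvi_normal_posterior(x, prior_mean=0.0, prior_var=1.0):
    """Minimise E_q[sum_i 0.5 * (x_i - theta)^2] + KL(q || N(prior_mean, prior_var))
    over q = N(m, s2), and return the optimal (m, s2).

    Assumed toy setting: setting the derivatives with respect to m and s2
    to zero gives the closed form used below.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    precision = n + 1.0 / prior_var  # loss curvature plus prior precision
    m = (x.sum() + prior_mean / prior_var) / precision
    s2 = 1.0 / precision
    return m, s2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mean = 2.0
    # A deliberately poor prior mean, to show that its influence fades with n.
    for n in (10, 100, 10_000):
        data = rng.normal(loc=true_mean, scale=1.0, size=n)
        m, s2 = gvi_normal_posterior(data, prior_mean=100.0)
        print(f"n={n:6d}  mean={m:.4f}  variance={s2:.6f}")
```

In this assumed setting the variance of the minimiser is $1/(n + 1/\sigma_0^2)$, where $\sigma_0^2$ is the prior variance, so it shrinks to zero as $n$ grows, and the weight of the prior mean in the posterior mean decays at the same rate.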